

Spectral bandits for smooth graph functions

Valko, Michal, Munos, Rémi, Kveton, Branislav, Kocák, Tomáš

arXiv.org Machine Learning

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node and its expected rating is similar to its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
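The abstract's idea — run a linear bandit in the graph Laplacian's eigenbasis, penalizing high-frequency components so that smooth reward functions are learned from few pulls — can be illustrated with a minimal sketch in the spirit of the paper's spectral UCB approach. Everything below (function name, the path-graph example, noise level, and parameter choices) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def spectral_ucb(L, true_rewards, T=500, lam=0.01, alpha=1.0, noise=0.1, seed=0):
    # Eigendecompose the Laplacian: columns of U are the graph's
    # "frequencies"; smooth functions concentrate on low eigenvalues.
    evals, U = np.linalg.eigh(L)
    n = len(evals)
    V = np.diag(evals + lam)      # eigenvalue-weighted ridge penalty
    b = np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(T):
        Vinv = np.linalg.inv(V)
        theta = Vinv @ b
        # Confidence width per node: sqrt of diag(U V^{-1} U^T)
        width = np.sqrt(np.einsum('ij,jk,ik->i', U, Vinv, U))
        arm = int(np.argmax(U @ theta + alpha * width))  # optimistic pick
        r = true_rewards[arm] + noise * rng.standard_normal()
        V += np.outer(U[arm], U[arm])
        b += r * U[arm]
    return U @ (np.linalg.inv(V) @ b)  # estimated mean reward per node

# Path graph on 10 nodes with a smooth (low-frequency) reward profile.
n = 10
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
true_rewards = np.cos(np.linspace(0.0, np.pi, n))
est = spectral_ucb(L, true_rewards)
```

The key design point the abstract hints at is the regularizer `diag(evals + lam)`: it is the spectral analogue of ridge regression, so directions with large Laplacian eigenvalues (non-smooth functions) are shrunk hardest.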


Acceleration through Optimistic No-Regret Dynamics

Jun-Kun Wang, Jacob D. Abernethy

Neural Information Processing Systems

Zero-sum games can be solved using online learning dynamics, where a classical technique involves simulating two no-regret algorithms that play against each other and, after T rounds, the average iterate is guaranteed to solve the original optimization problem with error decaying as O(log T/T). In this paper we show that the technique can be enhanced to a rate of O(1/T^2) by extending recent work [22, 25] that leverages optimistic learning to speed up equilibrium computation.
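The self-play-plus-averaging mechanism the abstract builds on can be sketched with plain (non-optimistic) Hedge learners; note that plain Hedge only yields the slower O(1/sqrt(T)) equilibrium error, and the O(log T/T) and O(1/T^2) rates quoted above require the carefully tuned and optimistic variants the paper discusses. The game matrix and parameters here are illustrative assumptions:

```python
import numpy as np

def hedge_selfplay(A, T=5000, eta=0.02):
    # Two Hedge (multiplicative-weights) learners in self-play on the
    # zero-sum game min_x max_y x^T A y. The *averaged* iterates, not
    # the last ones, approximate an equilibrium.
    n, m = A.shape
    Lx, Ly = np.zeros(n), np.zeros(m)       # cumulative losses
    x_sum, y_sum = np.zeros(n), np.zeros(m)
    for _ in range(T):
        x = np.exp(-eta * (Lx - Lx.min())); x /= x.sum()
        y = np.exp(-eta * (Ly - Ly.min())); y /= y.sum()
        x_sum += x; y_sum += y
        Lx += A @ y       # row player minimizes: loss of action i is (A y)_i
        Ly -= A.T @ x     # column player maximizes: loss is -(x^T A)_j
    return x_sum / T, y_sum / T

# 2x2 zero-sum game with mixed equilibrium x* = y* = (0.4, 0.6), value 0.2.
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
x, y = hedge_selfplay(A)
# Duality gap: how exploitable the averaged strategies still are.
gap = float(np.max(A.T @ x) - np.min(A @ y))
```

The gap is bounded by the sum of the two players' average regrets, which is exactly why faster (optimistic) no-regret dynamics translate directly into faster equilibrium computation.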


Online Adaptive Methods, Universality and Acceleration

Kfir Y. Levy, Alp Yurtsever, Volkan Cevher

Neural Information Processing Systems

Conversely, adaptive first-order methods are very popular in machine learning, with AdaGrad [12] being the most prominent method among this class. AdaGrad is an online learning algorithm which adapts its learning rate using the feedback (gradients) received through the optimization process, and is known to successfully handle noisy feedback.
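The adaptation the passage describes can be shown with a minimal sketch of the diagonal (per-coordinate) AdaGrad update; the function name, step size, and toy objective are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def adagrad(grad, x0, lr=0.1, eps=1e-8, steps=2000):
    # Per-coordinate AdaGrad sketch: each coordinate's effective step
    # size shrinks with the squared gradients accumulated so far, so
    # coordinates with large or noisy gradients are automatically damped.
    x = np.array(x0, dtype=float)
    g_acc = np.zeros_like(x)          # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        g_acc += g * g
        x -= lr * g / (np.sqrt(g_acc) + eps)
    return x

# Minimize f(x) = ||x||^2 (gradient 2x) with no per-problem rate tuning.
x_star = adagrad(lambda x: 2 * x, [3.0, -2.0])
```

Because the learning rate is set from the observed gradient history rather than from prior knowledge of smoothness or noise, the same configuration works across problems — the "universality" the paper's title refers to.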


max

Neural Information Processing Systems

We introduce a simple but general online learning framework in which a learner plays against an adversary in a vector-valued game that changes every round. Even though the learner's objective is not convex-concave (and so the minimax theorem does not apply), we give a simple algorithm that can compete with the setting in which the adversary must announce their action first, with optimally diminishing regret.



c74214a3877c4d8297ac96217d5189b7-Paper.pdf

Neural Information Processing Systems

However, the resulting methods often suffer from high computational complexity, which has reduced their practical applicability. For example, in the case of multiclass logistic regression, the aggregating forecaster (Foster et al. (2018)) achieves a regret of O(log(Bn)), whereas Online Newton Step achieves O(e^B log(n)), obtaining a double exponential gain in B (a bound on the norm of comparator functions).